1. INTRODUCTION

The automatic parking system is one of the modern driver-assistance services in intelligent transportation
systems. Its role is to help the driver park the vehicle safely and quickly [1], [2]. The system therefore
lowers the skill demanded of the driver and reduces human-caused accidents such as collisions. Modern
driver-assistance technologies typically perform three basic steps: object (obstacle) detection,
decision-making, and control [3]-[5]. Correspondingly, an automatic parking system consists of three parts:
parking environment recognition, path planning [6]-[8], and path tracking [9]-[11]. The parking controller
automatically obtains information about parking positions and obstacles from various sensors, such as
ultrasonic sensors, cameras, wheel speed sensors, and steering angle sensors [12]-[15]. These sensors provide
the distance from the vehicle to obstacles, real-time visual data, the current vehicle speed, and the steering
angle. From this multi-sensor information, the controller decides whether the autonomous vehicle should stop
or continue [16], [17]. Path planning, control, and monitoring interact constantly in the automatic parking
system. In particular, the path-tracking control algorithm is one of its critical technologies. It must ensure
accurate path tracking, driving comfort when the vehicle changes direction, and the correct position and
orientation at the end of the parking maneuver. Automated parking has therefore attracted many researchers,
who have proposed related control algorithms in both theory and experiment.
The work in [18] proposed an automatic parking path-tracking control method that accounts for time delay,
addressing the fact that the control model of traditional automated parking systems neglects the vehicle
control delay. Another study [19] proposed a semi-automatic parking assistance system based on the driver
navigation area, which uses sensors to recognize the environment in real time, optimize the parking space,
and optimize the parking route to avoid collisions. In [20], a model-free fuzzy controller for automatic
parking was proposed to track the parking path. The improved designs in [21], [22] combine fuzzy control with
neural networks. Such a controller only needs to know the parking configuration; the vehicle then tracks the
path and parks correctly. These algorithms can control and track the vehicle's path and park on demand.
However, they cannot coordinate the vehicle speed and steering control with changes of the parking path while
the vehicle is moving.

Existing control solutions are therefore still limited in the accuracy of path control and tracking. This
article proposes a model predictive controller (MPC) that uses a vehicle dynamics model to predict how the
vehicle will react to a particular control action over the prediction horizon. This is similar to the way a
driver understands and anticipates the behavior of his or her own vehicle. To compute the optimal control
action, the MPC controller must consider all input and output constraints on the system, such as the speed
limit, the safe following distance, the physical limits of the vehicle, the maximum steering angle, and the
obstacles to be avoided [23]-[26]. This paper presents a controller design that combines MPC, which makes the
car follow the reference path in the parking lot, with a reinforcement learning (RL) agent trained to perform
the parking maneuver. The MPC controller moves the vehicle at constant speed along the reference path while
the algorithm searches for an empty parking spot. Once a spot has been found, the RL controller performs the
parking maneuver. This hybrid controller performs obstacle detection and avoidance simultaneously in tight
parking spaces without human intervention. The system uses an adaptive model predictive controller that
updates both the prediction model and the mixed input/output constraints at each control interval. The
approach is validated through MATLAB simulation.

The article is organized in five parts. Section 1 introduces research on automatic parking control. Section 2
gives the mathematical model of the car. Based on this model, section 3 designs an MPC controller combined
with RL for vehicle movement and obstacle avoidance. Section 4 verifies the control solution through MATLAB
simulation. Finally, section 5 draws conclusions about the main features of the automatic parking solution
and future research directions.


2. MATHEMATICAL MODEL OF CAR 

As shown in Figure 1, the article employs a rectangular automobile model 5 m long and 2 m wide. The vehicle
uses a lidar sensor to help it avoid obstacles. This sensor measures the distance from the car to any obstacle
ahead of it in its lane. Obstacles may be stationary, like a large pothole, or moving, like a slow-moving car.
The most common driver response is to change lanes briefly, pass the obstacle, and return to the original
lane.

In Figure 1, the car coordinate model has four state variables: $x$ and $y$ are the coordinates of the center
of the car; $v$ is the speed of the vehicle; and $\theta$ is the heading angle of the car (0 when facing east,
positive counterclockwise). There are two manipulated variables: $T$ is the throttle (positive when
accelerating, negative when decelerating), and $\delta$ is the steering angle (0 when aligned with the car,
positive counterclockwise). $C_L$ is the length of the vehicle. The paper uses a simple nonlinear model to
describe the car dynamics, given in (1):


Figure 1. Coordinate model of cars

$$
\begin{aligned}
\dot{x} &= \cos(\theta)\, v \\
\dot{y} &= \sin(\theta)\, v \\
\dot{\theta} &= \frac{\tan(\delta)}{C_L}\, v \\
\dot{v} &= 0.5\, T
\end{aligned}
\tag{1}
$$
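
The nonlinear model (1) can be exercised numerically. The following is a minimal Python sketch, not the authors' implementation: the forward-Euler integration, the function names, and the default step of 0.1 are assumptions made for illustration, while the vehicle length of 5 m is taken from the model description above.

```python
import math

def vehicle_derivatives(state, throttle, steering, car_length=5.0):
    """Right-hand side of model (1); state = (x, y, theta, v)."""
    _, _, theta, v = state
    return (
        math.cos(theta) * v,                  # x_dot
        math.sin(theta) * v,                  # y_dot
        math.tan(steering) / car_length * v,  # theta_dot
        0.5 * throttle,                       # v_dot
    )

def euler_step(state, throttle, steering, dt=0.1, car_length=5.0):
    """Advance the state by one forward-Euler step (assumed integrator)."""
    derivs = vehicle_derivatives(state, throttle, steering, car_length)
    return tuple(s + dt * d for s, d in zip(state, derivs))

# Drive straight east at 2 m/s for ten steps with no throttle or steering.
state = (0.0, 0.0, 0.0, 2.0)
for _ in range(10):
    state = euler_step(state, throttle=0.0, steering=0.0)
print(state)
```

With zero throttle and zero steering the heading and speed stay constant, so only the x coordinate advances.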


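Taking the Jacobian of the nonlinear state model at the operating point gives the linear prediction model (2),
in which the coefficients are evaluated at the operating point:

$$
\begin{aligned}
\dot{x} &= -v \sin(\theta)\, \theta + \cos(\theta)\, v \\
\dot{y} &= v \cos(\theta)\, \theta + \sin(\theta)\, v \\
\dot{\theta} &= \frac{\tan(\delta)}{C_L}\, v + \frac{v\,(\tan^2(\delta) + 1)}{C_L}\, \delta \\
\dot{v} &= 0.5\, T
\end{aligned}
\tag{2}
$$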
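
The linearization can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: it writes the Jacobians of model (1) with respect to the state $(x, y, \theta, v)$ and the inputs $(T, \delta)$ by hand and compares them with central finite differences. The helper names, the operating point, and the step size are hypothetical.

```python
import math

def f(z, u, car_length=5.0):
    """Nonlinear model (1); z = [x, y, theta, v], u = [T, delta]."""
    _, _, theta, v = z
    throttle, delta = u
    return [math.cos(theta) * v,
            math.sin(theta) * v,
            math.tan(delta) / car_length * v,
            0.5 * throttle]

def jacobians(z, u, car_length=5.0):
    """Analytic Jacobians A = df/dz and B = df/du at the point (z, u)."""
    _, _, theta, v = z
    _, delta = u
    A = [[0.0, 0.0, -v * math.sin(theta), math.cos(theta)],
         [0.0, 0.0, v * math.cos(theta), math.sin(theta)],
         [0.0, 0.0, 0.0, math.tan(delta) / car_length],
         [0.0, 0.0, 0.0, 0.0]]
    B = [[0.0, 0.0],
         [0.0, 0.0],
         [0.0, v * (math.tan(delta) ** 2 + 1.0) / car_length],
         [0.5, 0.0]]
    return A, B

def numeric_jacobian(fun, point, eps=1e-6):
    """Central-difference Jacobian (rows = outputs, columns = inputs)."""
    n_out = len(fun(point))
    jac = [[0.0] * len(point) for _ in range(n_out)]
    for j in range(len(point)):
        hi, lo = list(point), list(point)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = fun(hi), fun(lo)
        for i in range(n_out):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return jac

def max_abs_diff(P, Q):
    """Largest entry-wise difference between two equally sized matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

z0 = [0.0, 0.0, 0.3, 2.0]   # hypothetical operating point
u0 = [0.0, 0.1]
A, B = jacobians(z0, u0)
A_num = numeric_jacobian(lambda z: f(z, u0), z0)
B_num = numeric_jacobian(lambda u: f(z0, u), u0)
print(max_abs_diff(A, A_num), max_abs_diff(B, B_num))
```

Both printed differences should be close to zero if the hand-written Jacobians agree with the nonlinear model.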


3. MPC CONTROLLER DESIGN AND RL-PPO REINFORCEMENT LEARNING
3.1. Model predictive controller (MPC)

The model predictive controller uses the plant (object) model together with input and output noise models to
predict and estimate the state. The model structure used in the MPC controller is shown in Figure 2. The
controller calculates the optimal control input by minimizing a cost function that penalizes deviations from
the desired state trajectory. The predicted state is then used to update the control input in real time,
allowing the controller to track the desired path accurately.
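
The receding-horizon idea described above can be illustrated with a deliberately small Python sketch. It is not the controller used in the paper: instead of solving a constrained quadratic program, it enumerates a few candidate steering sequences, predicts the state with model (1) at constant speed, and applies the first move of the cheapest sequence. The candidate set, the horizon, the steering penalty, and the use of weights 2 and 3 on the lateral and heading errors are all assumptions chosen for illustration.

```python
import itertools
import math

DT = 0.1          # prediction step (assumed)
CAR_LENGTH = 5.0  # vehicle length from the model section

def step(state, steering, dt=DT):
    """One forward-Euler step of model (1) at constant speed (T = 0)."""
    x, y, theta, v = state
    return (x + dt * math.cos(theta) * v,
            y + dt * math.sin(theta) * v,
            theta + dt * math.tan(steering) / CAR_LENGTH * v,
            v)

def cost(state, steering_seq, y_ref, w_y=2.0, w_theta=3.0, w_u=0.1):
    """Weighted squared tracking errors plus a steering-effort penalty."""
    total = 0.0
    for steering in steering_seq:
        state = step(state, steering)
        _, y, theta, _ = state
        total += (w_y * (y - y_ref) ** 2
                  + w_theta * theta ** 2
                  + w_u * steering ** 2)
    return total

def mpc_action(state, y_ref, candidates=(-0.3, 0.0, 0.3), horizon=4):
    """Return the first move of the cheapest candidate steering sequence."""
    best = min(itertools.product(candidates, repeat=horizon),
               key=lambda seq: cost(state, seq, y_ref))
    return best[0]

# Closed loop: re-plan at every step and apply only the first move.
state = (0.0, 0.0, 0.0, 2.0)
for _ in range(50):
    state = step(state, mpc_action(state, y_ref=1.0))
print(state)
```

Re-planning at every step is what makes the scheme receding-horizon: the rest of each optimized sequence is discarded once its first move has been applied.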


Figure 2. The MPC controller architecture

3.1.1. Object model 
The car state model is written as (3):

$$
\begin{aligned}
x_p(k+1) &= A_p x_p(k) + B S_i u_p(k) \\
y_p(k) &= S_o^{-1} C x_p(k) + S_o^{-1} D S_i u_p(k)
\end{aligned}
\tag{3}
$$

where $A_p$, $B$, $C$, and $D$ are constant zero-delay state-space matrices; $S_i$ is the diagonal matrix of
input scale factors; $S_o$ is the diagonal matrix of output scale factors; $x_p$ is the state vector, which
includes all delay states; $u_p$ is the vector of input variables, consisting of manipulated variables,
measured noise, and unmeasured input noise; and $y_p$ is the vector of output variables. State model (3) does
not show the noise terms separately, so the car state model is rewritten as (4):
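
$$
\begin{aligned}
x_p(k+1) &= A_p x_p(k) + B_{pu} u(k) + B_{pv} v(k) + B_{pd} d(k) \\
y_p(k) &= C_p x_p(k) + D_{pu} u(k) + D_{pv} v(k) + D_{pd} d(k)
\end{aligned}
\tag{4}
$$

where $C_p = S_o^{-1} C$; $B_{pu}$, $B_{pv}$, and $B_{pd}$ are the corresponding column blocks of $B S_i$;
$D_{pu}$, $D_{pv}$, and $D_{pd}$ are the corresponding column blocks of $S_o^{-1} D S_i$; $u(k)$ is the
manipulated variable; and $v(k)$ and $d(k)$ are the measured and unmeasured input noises.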

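The MPC controller requires $D_{pu} = 0$, which means that it does not allow direct feedthrough from any
manipulated variable to any output of the plant. The matrices $A$, $B$, $C$, and $D$ are determined as
follows:

$$
A = \begin{bmatrix} A_p & B_{pd} C_{id} & 0 & 0 \\ 0 & A_{id} & 0 & 0 \\ 0 & 0 & A_{od} & 0 \\ 0 & 0 & 0 & A_n \end{bmatrix}, \quad
B = \begin{bmatrix} B_{pu} & B_{pv} & B_{pd} D_{id} & 0 & 0 \\ 0 & 0 & B_{id} & 0 & 0 \\ 0 & 0 & 0 & B_{od} & 0 \\ 0 & 0 & 0 & 0 & B_n \end{bmatrix}
$$

$$
C = \begin{bmatrix} C_p & D_{pd} C_{id} & C_{od} & \begin{bmatrix} C_n \\ 0 \end{bmatrix} \end{bmatrix}, \quad
D = \begin{bmatrix} 0 & D_{pv} & D_{pd} D_{id} & D_{od} & \begin{bmatrix} D_n \\ 0 \end{bmatrix} \end{bmatrix}
$$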


3.1.2. Input noise model 
The input noise model is given by (5) and (6):

$$ x_{id}(k+1) = A_{id} x_{id}(k) + B_{id} w_{id}(k) \tag{5} $$

$$ d(k) = C_{id} x_{id}(k) + D_{id} w_{id}(k) \tag{6} $$

where $A_{id}$, $B_{id}$, $C_{id}$, and $D_{id}$ are constant state-space matrices; $x_{id}(k)$ is the vector
of $n_{xid} \ge 0$ input noise model states; $d(k)$ is the vector of $n_d$ unmeasured input noises; and
$w_{id}(k)$ is the vector of $n_{id} \ge 1$ white-noise inputs with zero mean.
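
To make the role of the input noise model concrete, the following Python sketch simulates a scalar instance of (5) and (6), reading the state recursion as $x_{id}(k+1) = A_{id} x_{id}(k) + B_{id} w_{id}(k)$. The scalar coefficients below describe integrated white noise and are an assumption for illustration; the paper does not state the values it uses. The function name and the seed are hypothetical.

```python
import random

def simulate_input_noise(a_id, b_id, c_id, d_id, steps, seed=0):
    """Simulate a scalar input noise model driven by white noise.

    x_id(k+1) = a_id * x_id(k) + b_id * w_id(k)
    d(k)      = c_id * x_id(k) + d_id * w_id(k)
    """
    rng = random.Random(seed)
    x_id = 0.0
    d_seq = []
    for _ in range(steps):
        w = rng.gauss(0.0, 1.0)           # zero-mean white-noise sample
        d_seq.append(c_id * x_id + d_id * w)
        x_id = a_id * x_id + b_id * w     # state update
    return d_seq

# Integrated white noise: a slowly drifting, step-like disturbance.
d_seq = simulate_input_noise(a_id=1.0, b_id=0.1, c_id=1.0, d_id=0.0, steps=5)
print(d_seq)
```

A model of this kind lets the state estimator attribute a persistent output offset to an unmeasured input rather than to measurement noise.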


3.1.3. Output noise model 
The output noise model is given by (7):

$$
\begin{aligned}
x_{od}(k+1) &= A_{od} x_{od}(k) + B_{od} w_{od}(k) \\
y_{od}(k) &= C_{od} x_{od}(k) + D_{od} w_{od}(k)
\end{aligned}
\tag{7}
$$

where $A_{od}$, $B_{od}$, $C_{od}$, and $D_{od}$ are constant state-space matrices; $x_{od}(k)$ is the vector
of $n_{xod} \ge 0$ output noise model states; $y_{od}(k)$ is the vector of $n_y$ output noises; and
$w_{od}(k)$ is the vector of $n_{od} \ge 1$ white-noise inputs with zero mean.


3.1.4. Measurement noise model
The measurement noise model is given by (8):

$$
\begin{aligned}
x_n(k+1) &= A_n x_n(k) + B_n w_n(k) \\
y_n(k) &= C_n x_n(k) + D_n w_n(k)
\end{aligned}
\tag{8}
$$

where $A_n$, $B_n$, $C_n$, and $D_n$ are constant state-space matrices; $x_n(k)$ is the vector of
$n_{xn} \ge 0$ noise model states; $y_n(k)$ is the vector of $n_{ym}$ noise signals added to the measured
outputs; and $w_n(k)$ is the vector of $n_n \ge 1$ white-noise inputs with zero mean.


3.2. Reinforcement learning 

The structure of the reinforcement learning strategy is depicted in Figure 3. RL is the area of machine
learning that studies how an agent in a given environment should choose its actions to maximize a cumulative
reward over the long term. RL algorithms seek a policy that maps the states of the environment to the actions
the agent should take in each state. The RL algorithms used in this context are closely related to dynamic
programming methods, since the environment is often represented as a finite set of states. Unlike supervised
learning, RL does not rely on labeled input/output pairs, and suboptimal actions are not explicitly corrected.
Instead, the agent must balance exploration (trying untested actions) against exploitation (using what it
already knows). In the RL paradigm, a scalar reward signal is used to train the agent to carry out a task in a
set of environmental conditions. The reward evaluates how well the latest action contributed to the task
goal. The agent has two parts: a policy and a training algorithm.

Figure 3. Structure of RL


3.2.1. Automated parking design 

The parking area used for training is 22.5 m long and 20 m wide, with the target spot at its horizontal
center. The observations are the position errors $X_e$ and $Y_e$ of the car with respect to the desired
vehicle pose, the sine and cosine of the actual heading angle $\theta$ of the vehicle, and the lidar sensor
readings. The lidar sensor, which determines the distance of the ego vehicle from other cars in the
environment, is modeled using geometric relationships. Lidar distances are measured along 12 radial lines
from the center of the ego vehicle. When a line intersects an obstacle, the lidar returns the distance from
the car to that obstacle. The maximum distance that can be measured along any line is 6 m.
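
A geometric lidar of this kind can be sketched in a few lines of Python. The example below is an illustration only: it keeps the 12 radial lines and the 6 m range from the text, but it represents obstacles as circles, which is an assumption made to keep the ray intersection short (the obstacles in the paper are vehicles). All function and variable names are hypothetical.

```python
import math

MAX_RANGE = 6.0   # maximum measurable distance along a line (m)
NUM_RAYS = 12     # number of radial lines

def ray_circle_distance(ox, oy, angle, cx, cy, radius):
    """Distance along a ray from (ox, oy) to a circle, or None if missed."""
    dx, dy = math.cos(angle), math.sin(angle)
    fx, fy = cx - ox, cy - oy
    proj = fx * dx + fy * dy                    # centre projected on the ray
    perp_sq = fx * fx + fy * fy - proj * proj   # squared ray-to-centre gap
    if perp_sq > radius * radius:
        return None
    half_chord = math.sqrt(radius * radius - perp_sq)
    t = proj - half_chord                       # nearest intersection
    if t < 0.0:
        t = proj + half_chord                   # ray starts inside the circle
    return t if t >= 0.0 else None

def lidar_scan(x, y, theta, obstacles):
    """Return 12 range readings, each capped at MAX_RANGE."""
    readings = []
    for i in range(NUM_RAYS):
        angle = theta + 2.0 * math.pi * i / NUM_RAYS
        best = MAX_RANGE
        for cx, cy, radius in obstacles:
            hit = ray_circle_distance(x, y, angle, cx, cy, radius)
            if hit is not None:
                best = min(best, hit)
        readings.append(best)
    return readings

# One circular obstacle of radius 1 m centred 4 m ahead of the vehicle.
scan = lidar_scan(0.0, 0.0, 0.0, [(4.0, 0.0, 1.0)])
print(scan)
```

Only the forward-pointing line intersects the obstacle in this configuration; the remaining lines return the maximum range.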

The parking speed is constant at 2 m/s. The steering angle is limited to the range from -45° to +45°, in
steps of 15°. The vehicle is considered parked if the position and orientation errors are within the
specified tolerances of ±0.75 m (position) and ±10° (orientation). A training episode ends if the vehicle
leaves the bounds of the training area, collides with an obstacle, or parks successfully. The reward at each
time $t$ is given by (9):

$$ r_t = 2 e^{-(0.05 X_e^2 + 0.04 Y_e^2)} + 0.5 e^{-40 \theta_e^2} - 0.05 \delta^2 + 100 f_t - 50 g_t \tag{9} $$

where $X_e$, $Y_e$, and $\theta_e$ are the errors in the position and heading angle of the car with respect
to the required pose; $\delta$ is the steering angle; $f_t$ (0 or 1) indicates whether the vehicle is parked
at time $t$; and $g_t$ (0 or 1) indicates whether the vehicle has collided with an obstacle at time $t$.
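
The reward and the parking test can be written compactly. The Python sketch below assumes that the reward (9) reads $r_t = 2e^{-(0.05X_e^2 + 0.04Y_e^2)} + 0.5e^{-40\theta_e^2} - 0.05\delta^2 + 100f_t - 50g_t$, and that angles are expressed in radians inside the reward; both the function names and the unit convention are assumptions made for illustration.

```python
import math

def parking_reward(x_err, y_err, theta_err, steering, parked, collided):
    """Reward (9): shaped pose terms, steering penalty, terminal bonuses."""
    return (2.0 * math.exp(-(0.05 * x_err ** 2 + 0.04 * y_err ** 2))
            + 0.5 * math.exp(-40.0 * theta_err ** 2)
            - 0.05 * steering ** 2
            + 100.0 * float(parked)       # f_t: 1 when parked
            - 50.0 * float(collided))     # g_t: 1 on collision

def is_parked(x_err, y_err, theta_err, pos_tol=0.75, ang_tol_deg=10.0):
    """Check the +/-0.75 m and +/-10 degree tolerances from the text."""
    return (abs(x_err) <= pos_tol
            and abs(y_err) <= pos_tol
            and abs(theta_err) <= math.radians(ang_tol_deg))

# Perfect pose, no steering, not yet flagged as parked.
print(parking_reward(0.0, 0.0, 0.0, 0.0, parked=False, collided=False))
```

The two exponential terms reward getting close to the target pose at every step, while the large terminal terms dominate once the vehicle parks or collides.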
The observed vehicle pose $(X, Y, \theta)$ is transformed to $(\bar{X}, \bar{Y}, \bar{\theta})$ according to
the location of the parking spot, as follows:
- Spots 1-14: no transformation
- Spots 15-22: $\bar{X} = Y$, $\bar{Y} = -X$, $\bar{\theta} = \theta - \pi/2$
- Spots 23-36: $\bar{X} = 100 - X$, $\bar{Y} = 60 - Y$, $\bar{\theta} = \theta - \pi$
- Spots 37-40: $\bar{X} = 60 - Y$, $\bar{Y} = X$, $\bar{\theta} = \theta - 3\pi/2$
- Spots 41-52: $\bar{X} = 100 - X$, $\bar{Y} = 30 - Y$, $\bar{\theta} = \theta + \pi$
- Spots 53-64: $\bar{X} = X$, $\bar{Y} = Y - 28$, $\bar{\theta} = \theta$


3.2.2. Reinforcement learning agent design

The article designs the RL agent based on proximal policy optimization (PPO). PPO is an online, model-free
policy-gradient training method that alternates between sampling data through interaction with the
environment and optimizing the objective function using stochastic gradient descent. The PPO RL agent is
built from a neural network consisting of an input layer that receives the observations and an output layer.
This neural network is trained empirically as the agent is trained. The number of iteration steps is set to
200, and the number of training episodes is 150. A learning rate parameter of 0.2 improves the stability of
training, and a discount factor of 0.997 favors long-term reward. The loss factor is 0.01. The variance of
the output is reduced using the generalized advantage estimation (GAE) method with a factor of 0.95. PPO
training is conducted as follows: train for up to 10000 episodes, each lasting up to 200 time steps. Training
stops when the maximum number of episodes is reached or the average reward reaches the target of 80 or more.
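
Two ingredients of PPO training can be sketched in plain Python: generalized advantage estimation and the clipped surrogate objective. This is a generic illustration of the published PPO technique, not the agent used in the paper. The discount factor 0.997 and the GAE factor 0.95 are taken from the text; using 0.2 as the clip range is an assumption, as are the function names.

```python
def gae_advantages(rewards, values, last_value, gamma=0.997, lam=0.95):
    """Generalized advantage estimates for one rollout.

    rewards[t] and values[t] belong to step t; last_value is the value
    estimate of the state reached after the final step (0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages

def clipped_surrogate(ratios, advantages, clip=0.2):
    """Mean PPO clipped objective (a quantity to be maximized).

    ratios[t] is the new-to-old policy probability ratio for step t.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

adv = gae_advantages(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
print(adv, clipped_surrogate([1.5, 0.5], adv))
```

Clipping the probability ratio limits how far a single update can move the policy away from the one that collected the data, which is the mechanism usually credited for the stability of PPO training.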


4. RESULTS OF SIMULATION AND ASSESSMENT

Automatic parking based on an MPC controller combined with RL-PPO is modeled according to the structure
shown in Figure 4. The MPC controller is simulated with sampling time $T_s = 0.1$ s and output weights
(2, 2, 3). The control variable constraints are [min -5; max +5]. The obstacle in this paper is assumed to be
a stationary object of the same size as the car, located in the middle of the center lane. The MPC controller
matrices have the following values:


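$$
A_d = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
B_d = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.5 \\ 0 & 0 \end{bmatrix}, \quad
C_d = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
D_d = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
$$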


Figure 4. MATLAB simulation structure of automatic parking based on the MPC controller combined with RL-PPO


The RL training results are shown in Figure 5, and the automatic parking results are shown in Figure 6.
Figure 5 shows the average number of steps achieved per episode over 80 random runs; the training time is
3551.2 seconds. In the first 200 episodes, the car completed fewer than 20 steps. From the 200th to the 900th
episode, the vehicle's obstacle avoidance improved continuously and the step count increased. Although the
maximum step count is first reached around the 900th episode, it is not always reached in subsequent
episodes; reaching it becomes more and more likely as the episode number increases. The reasons for these
results are as follows. In the first episodes, the car did not know how to avoid static and dynamic
obstacles, so it collided very early. An episode is terminated as soon as a collision occurs, and the vehicle
speed is constant at 2 m/s. Therefore, a low step count in Figure 5 indicates that the vehicle is not yet
avoiding obstacles well. With more training, that is, at higher episode numbers, the number of steps
increases. This means that although the vehicle moves continuously in an environment with static and dynamic
obstacles, it can apply the knowledge it has learned and make increasingly accurate decisions to avoid the
obstacles. The step value saturates at 1000 with increasing probability, showing that the vehicle can operate
well in a complex environment and achieve the maximum number of steps in later training episodes. This result
indicates that the algorithm has been implemented successfully. With the RL-PPO algorithm, the vehicle was
able to train itself to avoid static and moving objects.

Figure 6 shows the advantage of the RL controller: the vehicle has moved along the path and reached the
correct parking position. The elapsed time of the maneuver is 10.8 s. However, the parked car still has an
error along the Y axis (that is, the parked car is slightly slanted).

The responses of the x and y positions and the heading angle of the vehicle are shown in Figure 7. In the
simulation results, the required pose is (50.125; 4.9; -1.5709). The car thus reaches the target position
within the allowable errors of ±0.75 m (position) and ±10° (orientation). The parking assist brings the ego
vehicle to a stop after 10.8 seconds. The response of the steering angle is shown in Figure 8. This result
shows that the steering angle reaches a steady state after about 4.2 seconds at the required vehicle speed of
2 m/s.

Figure 5. Training process of RL-PPO (episode reward versus episode number)

Figure 6. Automated parking lot model

Figure 7. x, y axis response and vehicle tilt angle

Figure 8. Driving angle response


5. CONCLUSION 

This paper presents a controller design that combines model predictive control (MPC) with the RL-PPO
reinforcement learning method. In simulation, this integrated controller moves the vehicle while avoiding
obstacles and parks the car as required, with a short computation time. The work contributes to intelligent
transportation systems by improving driver-assistance services and supporting traffic management agencies.
However, to make this smart control solution more convincing and reliable, it needs to be compared with
other control methods, such as deep RL (deep Q-learning) and adaptive fuzzy tree control, and tested
experimentally in the future. Such a comparison will clarify the strengths and weaknesses of the proposed
integrated controller, and experimental testing in real-world scenarios will show its practical applicability
and performance under diverse conditions. Together, these steps will improve the robustness and relevance of
the intelligent control solution and contribute to the advancement of intelligent transportation systems and
autonomous vehicle technology.